Metabolic microarrays operate on the same principle as other kinds of microarrays
(Sect. 18.1): large numbers of small molecules are synthesized, typically
using combinatorial or other chemistry for generating high diversity. The array is then
exposed to the target, whose components of interest are usually labelled (although
their chemical diversity makes this more difficult than in the case of nucleic acids, for
example; moreover, the small size of metabolites makes it more likely that the label
chemically perturbs them). This technique can be used to answer questions such as
“to which metabolite(s) does macromolecule X bind?”
Much ingenuity is currently being applied to determine spatial variations in
selected metabolites. An example of a method developed for that purpose is PEBBLES
(probes encapsulated by biologically localized embedding): fluorescent dyes,
entrapped inside larger cage molecules, which respond (i.e., change their
fluorescence) to certain ions or molecules. Their spatial location in the cell can be
mapped using fluorescence microscopy. Another example is the development of
high-resolution scanning secondary ion mass spectrometry (“nanoSIMS”), whereby
a focused ion beam (usually Cs⁺ or O⁻) is scanned across a (somewhat conducting)
sample and the secondary ions released from the sample are detected mass
spectrometrically with a spatial resolution of some tens of nanometres. This method is
very favourable for certain metal ions, which can be detected at mole fractions of
as little as 10⁻⁶. If biomolecules are to be detected, it is advantageous to mark the
molecule or molecules of interest by enriching them with rare but stable isotopes
of their constituent atoms (e.g., ¹⁵N, whose natural abundance is typically less than
1%); the marked molecules can then easily be distinguished via the masses of their
fragments in the mass spectrometer. It is usually safe to assume that the physiological
effect of such marking is small.⁴⁰
As far as whole bodies are concerned, the blood is an extremely valuable organ
to analyse, since its composition sensitively depends on the state of the organism, to
the extent that blood is sometimes called “the sentinel of the body”.
23.13 Data Analysis
The first task in metabonomics is typically to correlate the presence of metabolites
with gene expression. One is therefore trying to correlate two datasets, each
containing hundreds of points, with each other. This in essence is a problem of pattern
recognition (Sect. 13.1). There are two categories of algorithms used for this task:
unsupervised and supervised.
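
As a minimal sketch of what correlating the two datasets can mean in practice, the following computes the Pearson correlation coefficient between every metabolite profile and every gene expression profile measured over the same set of samples. The function names `pearson` and `cross_correlation`, the choice of the Pearson coefficient, and the synthetic data are illustrative assumptions, not a method prescribed by the text.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def cross_correlation(metabolites, genes):
    """Correlate every metabolite profile with every gene profile.

    Each argument is a list of profiles; a profile is a list of values,
    one per sample, with samples in the same order in both datasets.
    Returns a table r where r[i][j] relates metabolite i to gene j.
    """
    return [[pearson(m, g) for g in genes] for m in metabolites]


# Synthetic, purely illustrative data: 50 samples, 3 genes, 4 metabolites.
random.seed(0)
n_samples = 50
genes = [[random.gauss(0, 1) for _ in range(n_samples)] for _ in range(3)]
metabolites = [[random.gauss(0, 1) for _ in range(n_samples)] for _ in range(4)]
# Assumption for the demo: metabolite 0 closely tracks gene 2.
metabolites[0] = [2.0 * v + random.gauss(0, 0.1) for v in genes[2]]

r = cross_correlation(metabolites, genes)

# Locate the metabolite/gene pair with the largest absolute correlation.
i, j = max(
    ((a, b) for a in range(len(metabolites)) for b in range(len(genes))),
    key=lambda pair: abs(r[pair[0]][pair[1]]),
)
print(f"strongest pair: metabolite {i}, gene {j}, r = {r[i][j]:.2f}")
```

With real data, each dataset contains hundreds of variables, so the table has tens of thousands of entries. Some of them will appear strongly correlated by chance alone, which is one reason pattern recognition methods are preferred to inspecting pairs one at a time.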
The unsupervised techniques determine whether there is any intrinsic clustering
within the dataset. Initial information is given as object descriptions, but the classes
to which the objects belong are not known beforehand. A widely used unsupervised
technique is principal component analysis (PCA, see Sect. 13.2.2). Essentially, the
original dataset is projected onto a space of lower dimension; for example, a set of
40 See Voigt and Matt (2004) for some insight into this question.